Acoustic Models Based on Non-uniform Segments and Bidirectional Recurrent Neural Networks

نویسنده

  • Mike Schuster
چکیده

In this paper a new framework for acoustic model building is presented. It is based on non-uniform segment models, which are learned and scored with a time bidirectional recurrent neural network. While usually neural networks in speech recognition systems are used to estimate posterior "frame to phoneme" probabilities, they are used here to estimate directly "segment to phoneme" probabilities, which results in an improved duration model. The special MAP approach allows not only incorporation of long term dependencies on the acoustic side, but also on the phone (output) side, which results automatically in parameter e cient context dependent models. While the use of neural networks as frame or phoneme classi ers always results in discriminative training for the acoustic information, the MAP approach presented here also incorporates discriminative training for the internally learned phoneme language model. Classi cation tests for the TIMIT phoneme database gave promising results of 77.75 (82.38)% for the full test data set with all 61 (39) symbols.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Acoustic model building based on non-uniform segments and bidirectional recurrent neural networks

In this paper a new framework for acoustic model building is presented. It is based on non-uniform segment models, which are learned and scored with a time bidirectional recurrent neural network. While usually neural networks in speech recognition systems are used to estimate posterior "frame to phoneme" probabilities, they are used here to estimate directly "segment to phoneme" probabilities, ...

متن کامل

Acoustic Modeling Using Bidirectional Gated Recurrent Convolutional Units

Convolutional and bidirectional recurrent neural networks have achieved considerable performance gains as acoustic models in automatic speech recognition in recent years. Latest architectures unify long short-term memory, gated recurrent unit and convolutional neural networks by stacking these different neural network types on each other, and providing short and long-term features to different ...

متن کامل

Application of artificial neural networks on drought prediction in Yazd (Central Iran)

In recent decades artificial neural networks (ANNs) have shown great ability in modeling and forecasting non-linear and non-stationary time series and in most of the cases especially in prediction of phenomena have showed very good performance. This paper presents the application of artificial neural networks to predict drought in Yazd meteorological station. In this research, different archite...

متن کامل

Improving English Conversational Telephone Speech Recognition

The goal of this work is to build a state-of-the-art English conversational telephone speech recognition system. We investigated several techniques to improve acoustic modeling, namely speaker-dependent bottleneck features, deep Bidirectional Long Short-Term Memory (BLSTM) recurrent neural networks, data augmentation and score fusion of DNN and BLSTM models. Training set consisted of the 300 ho...

متن کامل

Deep Recurrent Neural Networks for Acoustic Modelling

We present a novel deep Recurrent Neural Network (RNN) model for acoustic modelling in Automatic Speech Recognition (ASR). We term our contribution as a TC-DNN-BLSTM-DNN model, the model combines a Deep Neural Network (DNN) with Time Convolution (TC), followed by a Bidirectional LongShort Term Memory (BLSTM), and a final DNN. The first DNN acts as a feature processor to our model, the BLSTM the...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1996